(54) Video coding method of exploiting the temporal redundancy between successive frames



(57) The invention relates to a video coding method
of exploiting the temporal redundancy between successive
frames in a video sequence. A reference frame,
called I-frame, is first approximated by a collection of
geometric features, called atoms. The following predicted
frames, called P-frames, are approximated by the
geometric transformations of the geometric features
(atoms) describing the previous frame. Preferably, the
I-frame is approximated by a linear combination of N
atoms $g_{\gamma_n}(x,y)$:

$$I(x,y) = \sum_{n=0}^{N-1} c_n g_{\gamma_n}(x,y)$$

selected in a redundant, structured library. They are
indexed by a string of parameters $\gamma_n$ representing the
geometric transformations applied to the generating mother
function g(x,y), and the $c_n$ are weighting coefficients.
Description

[0001] The present invention relates to a video coding method of exploiting the temporal redundancy between
successive frames in a video sequence.

[0002] Efficiently encoding video or moving pictures relies heavily on exploiting the temporal redundancy between
successive frames in a sequence. Based on the assumption that local motions are slow with respect to the temporal
sampling period, several techniques have been proposed for efficiently removing this redundancy. The most successful
and acclaimed method is block-based motion prediction, heavily used in current standards such as MPEG-4 and
H.26L. Roughly speaking, these compression schemes predict a frame in the sequence based on the knowledge of
previous frames. The current (or predicted) frame is cut into blocks of fixed size and the best matching block is
searched for in the reference frame. Displacement vectors are then encoded so that the decoder can reconstruct the
prediction of the current frame from the previously decoded frame(s). As the block-based prediction is not accurate
enough to encode the current frame perfectly, the error between the original and the predicted frame is encoded
separately. This is in general referred to as texture coding or motion residual coding. The main drawback of this
method lies in the blocky nature of the prediction mechanism, which gives rise to very noticeable blocky artefacts at
low bit rates. Moreover, such a system, while well suited for wide translational motions, is unable to cope with
locally complex movements or even global geometric transformations such as zoom or rotation. Finally, block-based
motion prediction is not able to follow natural features of images since it is stuck in a fixed framework based on
artificial image primitives (blocks).

[0003] This invention proposes to solve the above-mentioned problems by introducing a new paradigm for dealing
with spatio-temporal redundancy.

[0004] The method according to the invention is defined in claim 1. Various embodiments are proposed in the
dependent claims.

[0005] The invention will be described with the help of accompanying representations. 

Figure 1: shows the progressive reconstruction of an I-frame with 50, 100, 150 and 200 atoms.

Figure 2: shows a flow-chart of the I-frames codec, where the atoms are used to transmit the frame.

Figure 3: shows a flow-chart of the I-frames codec, where the atoms are estimated from the coded image both at
the encoder and decoder.

Figure 4: shows three successive schematic updates of basis functions (atoms) inside a sequence of frames.

Figure 5: shows a flow-chart of the P-frames coder, using atom based motion prediction.

Figures 6 and 7: show the encoding of the Foreman sequence: I-frame, predicted frames #20, 50 and 99.

Figure 8: represents PSNR along the encoded Foreman sequence as a function of the number of predicted frames.

[0006] The proposed method is composed of two main parts. In a first step, a geometric model of a reference frame 
is built. In the second step, the model is updated by deformations in order to match successive frames. 
[0007] A reference frame (or intra frame, I-frame) is first decomposed into a linear combination of basis functions
(atoms) selected in a redundant, structured library (see P. Vandergheynst and P. Frossard, Efficient image
representation by anisotropic refinement in matching pursuit, in Proceedings of IEEE ICASSP, Salt Lake City UT,
May 2001, vol. 3, the content of which is incorporated herein by reference):

$$I(x,y) = \sum_{n=0}^{N-1} c_n g_{\gamma_n}(x,y)$$

[0008] In this equation, I(x,y) is the intensity of the I-frame represented as a function giving the gray level value of
the pixel at position (x,y), $c_n$ are weighting coefficients and $g_{\gamma_n}(x,y)$ are the atoms involved in the
decomposition. These atoms are image patches generated by a simple mathematical formula expressing the gray level
value at pixel position (x,y). The formula is built by applying geometric transformations to a function g(x,y) that we
call a generating mother function. The parameters of these transformations are concatenated in the vector of
parameters $\gamma$. Examples of possible transformations are translations, rotations or dilations. They act on the
generating mother function by change of variable, for example:

Translations:

$$g_b(x,y) = g(x - b_1, y - b_2)$$

Dilation:

$$g_a(x,y) = a^{-1} g(x/a, y/a)$$

[0009] Translations, anisotropic dilations and rotations:

$$g_\gamma(x,y) = \frac{1}{\sqrt{a_1 a_2}} g(x_n, y_n),$$

where

$$x_n = \frac{\cos\vartheta \, (x - b_1) - \sin\vartheta \, (y - b_2)}{a_1}, \qquad
y_n = \frac{\sin\vartheta \, (x - b_1) + \cos\vartheta \, (y - b_2)}{a_2},$$

and $\gamma = [a_1, a_2, b_1, b_2, \vartheta]$ are the parameters of this transformation.
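
The change of variables in [0009] can be illustrated with a short Python sketch that samples one atom on a pixel grid. It is only a sketch under stated assumptions: the Gaussian used as mother function is a placeholder (the text leaves the choice open), the 1/sqrt(a1*a2) factor is assumed as the anisotropic counterpart of the 1/a factor of the isotropic dilation, and the names `mother`, `atom` and `render_atom` are hypothetical.

```python
import math

def mother(x, y):
    """Placeholder generating mother function g(x, y): an isotropic Gaussian.
    This choice is an assumption made for illustration only."""
    return math.exp(-(x * x + y * y))

def atom(x, y, a1, a2, b1, b2, theta):
    """Evaluate g_gamma(x, y) for gamma = [a1, a2, b1, b2, theta]."""
    dx, dy = x - b1, y - b2                      # translation by (b1, b2)
    xn = (math.cos(theta) * dx - math.sin(theta) * dy) / a1   # rotation, dilation a1
    yn = (math.sin(theta) * dx + math.cos(theta) * dy) / a2   # rotation, dilation a2
    # Assumed normalisation: reduces to 1/a when a1 == a2 == a.
    return mother(xn, yn) / math.sqrt(a1 * a2)

def render_atom(width, height, gamma):
    """Sample the atom on a width x height pixel grid (list of rows)."""
    return [[atom(x, y, *gamma) for x in range(width)] for y in range(height)]

# One elongated, rotated atom centred on pixel (8, 8).
patch = render_atom(16, 16, (4.0, 1.5, 8.0, 8.0, math.pi / 6))
```

With these values the patch peaks at the pixel (8, 8) selected by the translation parameters, and the two dilation parameters stretch it along the rotated axes.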

[0010] Generating mother functions are chosen almost arbitrarily. Their properties can be adapted to the specific
application. A possible example is to select an oscillating function of the form:



[0011] The decomposition can for example be accomplished using a Matching Pursuit (see S. Mallat and Z. Zhang,
Matching Pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, 41(12):3397-3415,
December 1993, the content of which is incorporated herein by reference). Matching Pursuit (MP) is a greedy algorithm
that iteratively decomposes the image using the following scheme. First the atom that best matches the image is
searched by maximizing the scalar product between the image and the dictionary, $\langle I | g_\gamma \rangle$, and a
residual image is computed:

$$I = \langle I | g_{\gamma_0} \rangle g_{\gamma_0} + R_1$$

[0012] Then the same process is applied to the residual:

$$R_1 = \langle R_1 | g_{\gamma_1} \rangle g_{\gamma_1} + R_2$$

and iteratively:

$$R_n = \langle R_n | g_{\gamma_n} \rangle g_{\gamma_n} + R_{n+1}$$

[0013] Finally this yields a decomposition of the image in terms of a sum of atoms:





EP 1 435 740 A1 



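$$I(x,y) = \sum_{n=0}^{N-1} \langle R_n | g_{\gamma_n} \rangle g_{\gamma_n}(x,y) + R_N(x,y),$$

where $R_0 = I$ is the original image.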
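
The greedy iteration described above can be sketched in Python as follows. This is a minimal illustration, not the codec's implementation: it works on a one-dimensional signal for brevity, the toy dictionary of shifted Gaussian bumps is an assumption, and the names `matching_pursuit`, `dot` and `normalise` are hypothetical.

```python
import math

def dot(u, v):
    """Scalar product <u | v> of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: returns [(atom_index, coefficient), ...] and the final residual.
    `dictionary` is a list of unit-norm atoms with the same length as `signal`."""
    residual = list(signal)                      # R_0 = I
    decomposition = []
    for _ in range(n_atoms):
        # Pick the atom with the largest |<R_n | g>|.
        scores = [dot(residual, g) for g in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        coeff = scores[best]
        decomposition.append((best, coeff))
        # R_{n+1} = R_n - <R_n | g> g
        residual = [r - coeff * g for r, g in zip(residual, dictionary[best])]
    return decomposition, residual

# Toy redundant dictionary (assumption): unit-norm Gaussian bumps of two widths.
length = 32
dictionary = [
    normalise([math.exp(-((t - c) / w) ** 2) for t in range(length)])
    for w in (1.5, 4.0) for c in range(0, length, 2)
]
signal = [2.0 * a + 0.5 * b for a, b in zip(dictionary[3], dictionary[20])]
atoms, residual = matching_pursuit(signal, dictionary, n_atoms=8)
```

Because each selected atom has unit norm and the new residual is orthogonal to it, the energy of the signal equals the sum of the squared coefficients plus the energy of the last residual, which gives a simple check on an implementation.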



[0014] The basis functions (atoms) are indexed by a string of parameters representing geometric transformations
applied to a generating mother function g(x,y). This index can be seen as a point on a manifold. The set of geometric
transformations is designed in such a way that the total collection of basis functions (atoms) is a dense subspace of
$L^2(\mathbb{R}^2)$, i.e. any image can be exactly represented.

[0015] This part of the method expresses the I-frame as a collection of atoms that can be seen as geometric features
such as edges or parts of objects which are very noticeable by the human eye. These basic primitives, hereafter
referred to as atoms, form a primal sketch of the image. The atoms are modelled and fully represented by the set of
coefficients and parameters $\{c_n, \gamma_n, n = 0, \ldots, N-1\}$, where $c_n$ is the coefficient and $\gamma_n$ is
a vector of parameters.
[0016] There are two ways to handle the coding of the I-frames. The first one is to estimate the atoms of the original
frame. The atoms modelling the I-frame are then quantized, entropy coded and sent in the bitstream. The process of
quantizing an atom corresponds to the quantization of its coefficient and parameters. The atoms are also stored in
memory in order to be used for the prediction of the next frames. A flowchart of this procedure is shown in Figure 2,
where the atoms are used to transmit the frame. Dotted lines represent the coefficients and parameters of the atoms,
the bold solid line represents a video frame and the light solid line represents bitstreams. The flow-chart should be
understood as follows: The input frame is decomposed into atoms using the MP algorithm. The atoms are then quantized
and entropy coded. They are also de-quantized and stored in memory for the prediction of the next frames. Those
de-quantized atom parameters are used to reconstruct the image, with the help of the generating mother function (which
is known both by the encoder and decoder). Finally the difference between the original frame and the reconstructed
one is computed and encoded using the frame coder.

[0017] Note that the figure includes an optional step that encodes the motion residuals (or texture), which is the
difference between the original frame and the one reconstructed using the atoms. This encoding can be used to further
increase the quality of the decoded image up to a lossless reconstruction.

[0018] The second way of handling the I-frames is more conventional. The original frame is encoded and transmitted
using any frame codec. Then the atoms are estimated from the reconstructed frame, both at the encoder and at the
decoder. Finally those atoms are stored in memory for the prediction of future frames. The flowchart of this procedure
is shown in Figure 3, where the atoms are estimated from the coded image both at the encoder and decoder. Dotted
lines represent the coefficients and parameters of the atoms, the bold solid line represents a video frame and the light
solid line represents bitstreams. The flow-chart should be understood as follows: The input frame is first encoded with
the frame coder and sent to the decoder in a bitstream. It is also decoded, in order to get the same frame that will be
available at the decoder. This frame is decomposed into atoms by the MP algorithm. Finally those atoms are stored in
memory to be used for the prediction of the next frames.

[0019] The second step of the method consists in updating the image model (the set of all atoms) in order to take
into account the geometric deformations that have occurred between the reference and the current frame. Clearly,
since the model is based on geometric transformations, updating its atoms allows for adapting to smooth local distortions
(translations, rotations and scalings are common examples). In order to compute this update, we assume that the
deformed model is close enough to the reference model. We thus have to search for new atom parameters in the 
proximity of the previous solution. This is performed by means of a local optimisation procedure trying to minimize the 
mean square error between the updated model and the current frame (Figure 4, where three successive schematic 
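updates of basis functions (atoms) inside a sequence of frames are represented). The updated atom parameters are
then the solution of:

$$\{c_n^t, \gamma_n^t\} = \arg\min_{\{c_n, \gamma_n\}} \left\| I_t - \sum_{n=0}^{N-1} c_n g_{\gamma_n} \right\|^2$$
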
where the optimization method for frame $I_t$ at time t is initialised with the atom parameters corresponding to the
solution at time t-1 or to the reference frame (the I-frame) in order to avoid error propagation. This problem is a
non-convex, nonlinear, differentiable optimisation problem (see Dimitri P. Bertsekas (1999) Nonlinear Programming:
2nd Edition. Athena Scientific; http://www.athenasc.com/nonlinbook.html, the content of which is incorporated herein
by reference), which can be solved using various algorithms such as quasi-Newton methods, combined with line-search
or trust-region globalisation techniques (see Conn A., Gould N. & Toint Ph. (2000) Trust Region Methods. SIAM.
http://www.fundp.ac.be/~phtoint/pht/trbook.html, the content of which is incorporated herein by reference), in order
to identify a local optimum.
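
The local update can be sketched in Python on a one-dimensional frame. Several things here are assumptions made to keep the example short: plain gradient descent with a backtracking line search stands in for the quasi-Newton methods mentioned in the text, the gradient is taken by finite differences, each atom is a Gaussian with parameters (c, b, a), and the names `model`, `mse`, `gradient` and `update_atoms` are hypothetical.

```python
import math

def model(params, length):
    """1-D frame synthesised from atoms; params is a flat list [c, b, a, c, b, a, ...]."""
    frame = [0.0] * length
    for k in range(0, len(params), 3):
        c, b, a = params[k:k + 3]
        for t in range(length):
            frame[t] += c * math.exp(-((t - b) / a) ** 2)
    return frame

def mse(params, target):
    """Mean square error between the model and the current frame."""
    approx = model(params, len(target))
    return sum((u - v) ** 2 for u, v in zip(approx, target)) / len(target)

def gradient(params, target, eps=1e-5):
    """Central finite-difference gradient of the MSE."""
    grad = []
    for k in range(len(params)):
        hi = params[:k] + [params[k] + eps] + params[k + 1:]
        lo = params[:k] + [params[k] - eps] + params[k + 1:]
        grad.append((mse(hi, target) - mse(lo, target)) / (2 * eps))
    return grad

def update_atoms(prev_params, target, iterations=200):
    """Local search initialised with the parameters of the previous frame."""
    params = list(prev_params)
    error = mse(params, target)
    for _ in range(iterations):
        grad = gradient(params, target)
        step = 1.0
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_error = mse(trial, target)
            if trial_error < error:          # accept only improving steps
                params, error = trial, trial_error
                break
            step *= 0.5                      # backtracking line search
        else:
            break                            # no improving step found: stop
    return params, error

previous = [1.0, 10.0, 3.0]                      # atom fitted on frame t-1
current_frame = model([0.9, 11.5, 3.4], 32)      # frame t: atom moved and widened
updated, final_error = update_atoms(previous, current_frame)
```

Starting from the previous solution keeps the search in the neighbourhood where the deformed model is assumed to lie, which is why a local method is sufficient here.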

[0020] The difference between the original atom parameters and the updated ones is then computed and sent together
with updated atom coefficients. Quantization and entropy coding can then be performed (P. Frossard, P. Vandergheynst
and R.M. Figueras y Ventura, Redundancy driven a posteriori matching pursuit quantization, ITS Technical
Report 2002, the content of which is incorporated herein by reference) and a bitstream generated. This procedure is
shown in detail in Figure 5, which represents a flow-chart of the P-frames coder, using atom based
motion prediction. Dotted lines represent the coefficients and parameters of the atoms, the bold solid line represents
a video frame and the light solid line represents bitstreams. The flow-chart should be understood as follows: The input
frames are passed to the parameter estimation, which will modify the atom parameters stored in memory in order for
them to describe the input correctly. The difference between the new atom parameters and the ones stored in memory
is computed and the result is quantized and entropy coded. Those quantized differences are also de-quantized and
added to the atom parameters previously stored in memory. This allows the reconstruction of the same atoms as the
ones available at the decoder. Those reconstructed atom parameters are then stored in memory, replacing the ones of
the previous frame. They are also used to reconstruct the current frame using the generating mother function available
at the encoder and decoder. The difference between this reconstruction and the original input is computed and encoded
using the frame coder. The number of updated atoms and their quantization can be fixed but can also be chosen in
an adaptive manner through rate and distortion constraints by means of a rate controller. For each motion predicted
frame, the motion residuals (or texture) can be computed and encoded using any still image codec. They would then be
sent in the bitstream in order to generate a scalable stream achieving lossy to lossless compression.
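
The closed loop described in this paragraph (quantise the parameter differences, then de-quantise them at the encoder so that its memory matches the decoder's) can be sketched in Python. The uniform scalar quantiser, the step size and the function names are assumptions for illustration, and entropy coding is left out.

```python
def quantize(value, step):
    """Uniform scalar quantiser (an assumption): returns an integer index."""
    return round(value / step)

def dequantize(index, step):
    return index * step

def encode_update(stored, estimated, step):
    """Encoder side: quantise parameter differences, then rebuild the decoder's view."""
    indices = [quantize(new - old, step) for old, new in zip(stored, estimated)]
    rebuilt = [old + dequantize(i, step) for old, i in zip(stored, indices)]
    return indices, rebuilt

def decode_update(stored, indices, step):
    """Decoder side: add de-quantised differences to the stored parameters."""
    return [old + dequantize(i, step) for old, i in zip(stored, indices)]

step = 0.25
encoder_memory = [4.0, 1.5, 8.0, 8.0, 0.5]   # [a1, a2, b1, b2, theta] from frame t-1
decoder_memory = list(encoder_memory)

# Two successive P-frames with hypothetical estimated parameters.
for estimated in ([4.1, 1.5, 8.9, 8.2, 0.55], [4.3, 1.6, 9.7, 8.6, 0.62]):
    indices, encoder_memory = encode_update(encoder_memory, estimated, step)
    decoder_memory = decode_update(decoder_memory, indices, step)
```

Because the encoder stores the de-quantised parameters rather than its own estimates, both memories stay identical from frame to frame and the quantisation error of each parameter stays within half a step instead of accumulating.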
[0021] Typical results of the motion prediction, with a coding cost of 170 Kbps on average, are shown in Figure 6,
where predicted frames 20, 50 and 99 are represented. It can be seen that the motion prediction stays accurate even
after 100 frames, even in the absence of encoding of the motion residual. This is the major advantage of this technique
compared to block based compensation. In order to show the advantage of the new prediction technique, we have
done the same experiment with typical block matching compensation. The same sequence was encoded using adaptive
block compensation with block sizes of 4x4 to 32x32. The encoder automatically selected the best block size according
to Rate-Distortion optimisation. By varying the block size it is possible to control the size of the compressed bitstream
in order to match the result of the atom based motion prediction. As in the case of the atom based prediction, the
motion residuals were not coded. The motion prediction results of the block matching are shown in Figure 7.
Objective comparison can be done using the PSNR measure. This measure is computed using the squared difference
between the original and the reconstructed frame, i.e.

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\frac{1}{N_p} \sum_{x,y} \left( I(x,y) - \hat{I}(x,y) \right)^2} \right),$$

where I(x,y) is the original frame, $\hat{I}(x,y)$ is the reconstructed frame and $N_p$ is the number of pixels. The
PSNR comparisons of the two prediction methods, for a bitstream of average size of 170 Kbps, are shown in Figure 8.
The performance of the atom based motion prediction is particularly good, taking into consideration that
state-of-the-art block based motion prediction was used for comparison. Moreover, it can be seen that in the long
term (more than 50 frames) the prediction of the atom based method is more constant. In typical block matching
applications, the block size is limited to 8x8, which causes poorer performance for the block matching.
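
For reference, the PSNR measure can be computed with a few lines of Python. This is a sketch assuming 8-bit gray-level frames stored as lists of rows and the usual normalisation of the squared difference by the number of pixels; the function name `psnr` is hypothetical.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized gray-level frames (lists of rows)."""
    count = 0
    squared_error = 0.0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            squared_error += (o - r) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf          # identical frames
    return 10.0 * math.log10(peak * peak / mse)

# Every pixel is off by one gray level, so the mean squared error is 1.
original = [[100, 110], [120, 130]]
reconstructed = [[101, 109], [121, 129]]
value = psnr(original, reconstructed)
```

In this toy case the result equals 20 log10(255), roughly 48.13 dB, which is the PSNR of any 8-bit frame pair whose mean squared error is exactly one.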

[0022] The number of predicted frames can be fixed but can also be chosen in an adaptive manner through rate and
distortion constraints by means of a rate controller. For instance, when abrupt changes occur (shots between scenes),
the model is no longer an accurate base and the reference I-frame can simply be refreshed in order to compute a
new model. Such a refresh mechanism can be monitored by tracking frame-to-frame distortion, which should stay
rather constant for smooth updates of the initial model (Figure 8).
Claims

1. Video coding method of exploiting the temporal redundancy between successive frames in a video sequence,
characterized in that a reference frame, called I-frame, is first approximated by a collection of geometric features,
called atoms, and that the following predicted frames, called P-frames, are approximated by the geometric
transformations of the geometric features (atoms) describing the previous frame.

2. Video coding method according to claim 1, characterized in that the I-frame is approximated by a linear
combination of N atoms $g_{\gamma_n}(x,y)$:

$$I(x,y) = \sum_{n=0}^{N-1} c_n g_{\gamma_n}(x,y)$$

selected in a redundant, structured library and indexed by a string of parameters $\gamma_n$ representing the geometric
transformations applied to the generating mother function g(x,y), and the $c_n$ are weighting coefficients.

3. Video coding method according to claim 2, characterized in that the atoms occurring in the decomposition are 
chosen using the Matching Pursuit algorithm. 

4. Video coding method according to one of the claims 1 to 3, characterized in that the parameters and coefficients
of the atoms are quantized and entropy coded.

5. Video coding method according to claim 4, characterized in that the quantization of the parameters and the
coefficients can vary across time, and that the variation is controlled by a rate control unit.

6. Video coding method according to one of the claims 1 to 5, characterized in that the system is used as a motion
prediction, and that the differences between the original frames and the ones reconstructed using the atoms, called
the residual images, are encoded using another frame based codec.

7. Video coding method according to one of the claims 1 to 6, characterized in that the geometric features (atoms)
of the I-frame are computed from the quantized frames at the encoder and decoder and are not transmitted.

8. Video coding method according to one of the claims 1 to 7, characterized in that the geometric features (atoms)
are re-computed after each quantized frame at the encoder and decoder and replace the previous prediction.

9. Video coding method according to one of the claims 1 to 8, characterized in that the geometric transformations
used to build the library are composed of translations, anisotropic dilations and rotations, applied to a generating
mother function g(x,y) by means of the following change of variables:

$$g_\gamma(x,y) = \frac{1}{\sqrt{a_1 a_2}} g(x_n, y_n),$$

where

$$x_n = \frac{\cos\vartheta \, (x - b_1) - \sin\vartheta \, (y - b_2)}{a_1}, \qquad
y_n = \frac{\sin\vartheta \, (x - b_1) + \cos\vartheta \, (y - b_2)}{a_2}.$$

10. Video coding method according to one of the claims 1 to 9, characterized in that the generating mother function
is of the following form: